A New Language for Constraint Grammar: Estonian
Abstract
The Constraint Grammar of Estonian presented in this paper is the first attempt at automatic syntactic analysis of Estonian. The grammar consists of 1,240 morphological disambiguation rules, 47 clause boundary detection rules, 180 morphosyntactic mapping rules and 1,118 syntactic constraints. The rules were devised using a training corpus of 20,300 words and tested on a benchmark corpus of 10,000 words. In the tests, 86.6% of words became morphologically unambiguous, and the error rate of the morphological disambiguator was 1.8%. The results of the full analysis show an ambiguity rate of 83% and error rate...
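The percentages quoted in the abstract can be read as token-level ratios over a benchmark corpus. The following is a minimal sketch of how such figures might be computed, assuming (this is not stated in the abstract) that a word counts as unambiguous when exactly one reading survives disambiguation, and as an error when the correct reading has been discarded. The `Token` class, the `evaluate` function, and the tag strings are hypothetical and serve only as illustration.

```python
from dataclasses import dataclass


@dataclass
class Token:
    form: str        # surface word form
    readings: set    # readings left after disambiguation (hypothetical tags)
    gold: str        # manually verified correct reading


def evaluate(tokens):
    """Return (share of unambiguous words, error rate) for a token list.

    Assumed definitions, not taken from the paper:
    - unambiguous: exactly one reading remains;
    - error: the correct reading is no longer among the remaining ones.
    """
    total = len(tokens)
    unambiguous = sum(1 for t in tokens if len(t.readings) == 1)
    errors = sum(1 for t in tokens if t.gold not in t.readings)
    return unambiguous / total, errors / total


# Toy data with placeholder forms and tags.
sample = [
    Token("w1", {"N_sg_nom"}, "N_sg_nom"),            # unambiguous, correct
    Token("w2", {"V_pres", "N_sg_gen"}, "V_pres"),    # still ambiguous, correct
    Token("w3", {"Adv"}, "Adv"),                      # unambiguous, correct
    Token("w4", {"N_sg_gen"}, "N_sg_nom"),            # unambiguous, wrong
]

unambiguous_share, error_rate = evaluate(sample)
print(f"unambiguous: {unambiguous_share:.1%}, errors: {error_rate:.1%}")
```

Under these assumed definitions the two figures are independent: a word can be left unambiguous and still be wrong, which is why the abstract reports both.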
Related resources
Shallow Parsing of Spoken Estonian Using Constraint Grammar
In this paper we describe how we have adapted the syntactic analyzer of written Estonian to the spoken language. The Constraint Grammar shallow syntactic parser (Müürisep et al. 2003) was used for the automatic syntactic analysis of the corpus of Estonian spoken language (Hennoste et al. 2000). To adapt the parser, the clause boundary detection rules as well as some syntactic constraints had to...
Syntactically annotated corpora of Estonian
Syntactically annotated corpora are needed 1) to train and test parsers and various language technology products (grammar checkers, information retrievers and extractors, machine translators, etc.); 2) to check the agreement of existing linguistic theories with real language usage. The corpora can be annotated at different levels of depth. In shallow syntactically annotated corpora a syntact...
Disfluency Detection and Parsing of Transcribed Speech of Estonian
The paper introduces our strategy for adapting a rule-based parser of written language to transcribed speech. Special attention has been paid to disfluencies (repairs, repetitions and false starts). A Constraint Grammar-based parser was used for shallow syntactic analysis of spoken Estonian. The modification of the grammar and additional methods improved the recall from 97.5% to 97.7% and precision...
Parsing Estonian with Constraint Grammar
This paper describes the current state of syntactic analysis of Estonian using Constraint Grammar, focusing mainly on the determination of syntactic functions. The Constraint Grammar of Estonian was written in 1996-2000 at the University of Tartu. The author developed its syntactic part.
Determination of Syntactic Functions in Estonian Constraint Grammar
This article describes the current state of syntactic analysis of Estonian using Constraint Grammar. The Constraint Grammar framework divides parsing into two modules: morphological disambiguation and determination of syntactic functions. This article focuses on the latter module in detail. If the morphological disambiguator achieves a precision of more than 85% and the error rate is smaller tha...